SUMMARY

In many engineering, economic, biological, and statistical
control processes, a decision-making device is called upon to
perform under various conditions of uncertainty regarding
underlying physical processes. These conditions range from
complete knowledge to total ignorance. As the process unfolds,
additional information may become available to the controlling
element, which then has the possibility of "learning" to improve
its performance based upon experience; i.e., the controlling
element may adapt itself to its environment.

On  a  grand  scale,  situations  of  this  type  occur  in  the 
development  of  physical  theories  through  the  mutual  interplay 
of  experimentation  and  theory;  on  a  smaller  scale  they  occur 
in  connection  with  the  design  of  learning  servomechanisms  and 
adaptive  filters. 

The central purpose of this paper is to lay a foundation
for the mathematical treatment of broad classes of such adaptive
processes. This is accomplished through use of the concepts
of dynamic programming.

Subsequent papers will be devoted to specific applications
in different fields and various theoretical extensions.

DYNAMIC PROGRAMMING AND ADAPTIVE PROCESSES - I:
MATHEMATICAL FOUNDATION

Richard  Bellman 
Robert  Kalaba 


1. Introduction

The purpose of this paper is to lay a foundation for a
mathematical theory of a significant class of decision processes
which have not as yet been studied in any generality. These
processes, which will be described in some detail below, we
shall call adaptive.

They arise in practically all parts of statistical study,
practically engulf the field of operations research, and play
a paramount role in the current theory of stochastic control
processes of electronic and mechanical origin. All three of
these domains merge in the consideration of the problems of
communication theory.

Independently, theories governing the treatment of processes
of this nature are essential for the understanding and
development of automata and of machines that "learn."

We propose to illustrate how the theory of dynamic
programming, [1], can be used to formulate in precise terms a
number of the complex and vexing questions that arise in these
studies. Furthermore, the functional equation approach of
dynamic programming enables us to treat some of these problems
by analytic means, and to resolve others, where direct analysis
is stymied, by computational techniques.


General questions are treated in an abstract fashion. In
subsequent papers, we shall apply the formal structure erected
here to specific applications.

2. Adaptive Processes

We wish to study multi-stage decision processes, and
processes which can be construed to be of this nature, for which
we do not possess complete information. This lack of information
takes various forms of which the following are typical.

We may not be in possession of the entire set of admissible
decisions; we may not know the effects of these decisions; we
may not be aware of the duration of the processes and we may
not even know the over-all purpose of the process. In any
number of processes occurring in the real world, these are some
of the difficulties we face.

The basic problem is that of making decisions on the basis
of the information that we do possess. An essential part of
the problem is that of using this accumulated knowledge to gain
further insight into the structure of the processes, using
analytic, computational and experimental techniques.

From this intuitive description of the types of problems
that we wish to consider, it is clear that we are impinging
upon some of the fundamental areas of scientific research.
Obvious as the existence of these problems is, it is not at
all clear how questions of this nature can be formulated in
precise terms.

Particular processes of this type have been treated in a
number of sources, such as the works on sequential analysis, cf.
Wald, [14]; the theory of games, cf. von Neumann and Morgenstern;
the theory of multi-stage games, cf. Bellman, [1], Chapter 10;
and the papers on "learning processes" of Flood, [5], [6],
Robbins, [11], Karlin and Johnson, [8], Bellman, [2], Bellman
and Kalaba, [3].

3. The Unfolding of a Physical Process

In order to appreciate the type of process we wish to
consider, the problems we shall treat, the terminology we shall
employ, and the methods we shall use, it is essential that we
discuss, albeit in abstract terms, the behavior of the
conventional deterministic physical system.

Let a system S be described at any time t by a state
vector p. Let $t_1, t_2, \ldots$ be a sequence of times,
$t_1 < t_2 < \cdots$, at which the system is subject to a change
which manifests itself in the form of a transformation. At time
$t_1$, $p_1$ is converted into $T_1(p_1)$; at time $t_2$,
$p_2 = T_1(p_1)$ is converted into $T_2(p_2)$, and so on, with
the result that the sequence of states of the system is given by
the sequence $\{p_k\}$, where

(1)  $p_{k+1} = T_k(p_k), \quad k = 1, 2, \ldots.$

The state of the system at the end of time $t_N$ is then
given by

(2)  $p_{N+1} = T_N T_{N-1} \cdots T_1(p_1),$

where $p_1$ is the initial state of S.

If $T_k$ is independent of k, which is to say, if the
same transformation is applied repeatedly, then the preceding
result can be written symbolically in the form

(3)  $p_{N+1} = T^N(p_1).$

The interpretation of the behavior of a physical system
over time as the iteration of a transformation was introduced
by Poincaré, and extensively studied by G. D. Birkhoff, [4],
and others. It furnishes the background for the application of
modern abstract operator theory to the study of physical
systems, as, for example, in quantum mechanics; cf. von Neumann,
[12]. The idea of using this fundamental representation in
connection with the formulation of the ergodic theorem is due
to B. O. Koopman.

4.  Feedback  Control 

With all this in mind, we are now able to introduce the
concept of feedback control.

Supposing that the behavior of the system as described by
the foregoing equations is not satisfactory, we propose to
modify it by changing the character of the transformation
acting upon p. This change will be made dependent upon the
state of the system at the particular time the transformation
is applied.

In order to indicate the fact that we now have a choice
of transformations, we write T(p,q) in place of T(p). The
variable q indicates the choice that is made. Consequently,
we shall call it the control variable, as opposed to p, the
state variable. To simplify the notation and discussion, we
shall assume that the set of admissible transformations does
not vary with time.

If $q_k$ denotes the choice of the control variable at time
$t_k$, we have, in place of (3.1), the relation

(1)  $p_{k+1} = T(p_k, q_k), \quad k = 1, 2, \ldots,$

with $p_{N+1}$ explicitly determined as in (3.2).

The associated variational problem is that of choosing
$q_1, q_2, \ldots, q_N$ to make the behavior of the system conform
as closely as possible to some preassigned pattern. We wish,
however, to do more than leave the problem in this vague format.

5.  Causality 

Turning back, for the moment, to the deterministic,
uncontrolled process discussed in §3, let us note that the state
of the system at time $t_{k+1}$ is a function of the initial state
of the system, and the number of transformations that have been
applied. Consequently, we may write

(1)  $p_{k+1} = f_k(p_1),$

where $p_1$ is the initial state of the system.

For the sake of convenience, let us merely write p in
place of $p_1$. Then, the function $f_k(p)$ is easily seen to
satisfy the basic functional equation

(2)  $f_{m+n}(p) = f_m(f_n(p)).$

This is the fundamental semi-group property of dynamical
systems.
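The semi-group property can be checked numerically for a time-independent transformation. The following is a minimal sketch, not part of the original development; the scalar transformation `T` and the helper `iterate` are hypothetical choices made only for illustration.

```python
from typing import Callable

State = float


def iterate(T: Callable[[State], State], p: State, k: int) -> State:
    """Return f_k(p): the state reached by applying T to p exactly k times."""
    for _ in range(k):
        p = T(p)
    return p


def T(p: State) -> State:
    # Hypothetical transformation, chosen only for illustration.
    return 0.5 * p + 1.0


p = 3.0
m, n = 4, 3
lhs = iterate(T, p, m + n)             # f_{m+n}(p)
rhs = iterate(T, iterate(T, p, n), m)  # f_m(f_n(p))
print(lhs, rhs)
```

Both quantities are produced by the same seven applications of `T`, which is all the semi-group property asserts in the uncontrolled case.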

6.  Optimality 

With the foregoing as a guide, let us see if we can
formulate the feedback control process in the same terms.

To illustrate the applicability of the functional equation
technique, let us consider a finite process, of N stages,
where it is desired to maximize a preassigned function,
$\varphi$, of the final state of the system, $p_{N+1}$. This is
often called a terminal control process.

The variational problem may now be posed in the following
terms:

(1)  $\max \varphi(p_{N+1}).$

This maximum, which we shall assume exists, is again a function
of the initial state, p, and the duration of the process.

Let us then introduce the function $f_N(p)$, defined for all
states p and N = 1,2,..., by the relation

(2)  $f_N(p) = \max_{q} \varphi(p_{N+1}),$

where q represents the set $[q_1, q_2, \ldots, q_N]$.

Let us now introduce some additional terminology. A set
of admissible choices of the $q_i$, $[q_1, q_2, \ldots, q_N]$,
will be called a policy; a policy which maximizes
$\varphi(p_{N+1})$ will be called an optimal policy.

In order to obtain a functional equation corresponding to
(5.2), we invoke the

PRINCIPLE OF OPTIMALITY. An optimal policy has the
property that whatever the initial state and initial decision
are, the remaining decisions must constitute an optimal policy
with regard to the state resulting from the first decision.

The mathematical transliteration of this statement is the
functional relation

(3)  $f_N(p) = \max_{q_1} f_{N-1}(T(p, q_1)), \quad N = 2, 3, \ldots,$

with

(4)  $f_1(p) = \max_{q_1} \varphi(T(p, q_1)).$

Further discussion, and various existence and uniqueness
theorems for the functions $\{f_N(p)\}$ and the associated
policies will be found in [1].

In  this  way,  the  calculus  of  variations  is  seen  to  be  a 
part  of  an  extension  of  the  classical  theory  of  iteration,  and 
of  semi-group  theory. 
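The recurrence (3), (4) can be compared with direct enumeration of all policies on a small example. The sketch below is only an illustration under stated assumptions: the finite state space, the control set `CONTROLS`, the transformation `T`, and the criterion `phi` are all hypothetical and do not come from the text.

```python
from functools import lru_cache
from itertools import product

CONTROLS = (-1, 0, 1)  # hypothetical admissible decisions q
MODULUS = 11           # hypothetical finite state space {0, ..., 10}


def T(p: int, q: int) -> int:
    """Hypothetical transformation T(p, q) on the finite state space."""
    return (3 * p + q) % MODULUS


def phi(p: int) -> int:
    """Hypothetical terminal criterion to be maximized."""
    return -(p - 4) ** 2


@lru_cache(maxsize=None)
def f(N: int, p: int) -> int:
    """Return f_N(p) from the recurrence (6.3) with initial condition (6.4)."""
    if N == 1:
        return max(phi(T(p, q)) for q in CONTROLS)
    return max(f(N - 1, T(p, q)) for q in CONTROLS)


def brute_force(N: int, p: int) -> int:
    """Maximize phi over every policy [q_1, ..., q_N] by direct enumeration."""
    best = None
    for policy in product(CONTROLS, repeat=N):
        state = p
        for q in policy:
            state = T(state, q)
        value = phi(state)
        best = value if best is None else max(best, value)
    return best


print(f(4, 0), brute_force(4, 0))
```

The enumeration examines every one of the 3^N policies, while the recurrence evaluates each pair (N, p) only once; the two agree, as the principle of optimality requires.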
7. Stochastic Elements

In order to treat questions arising in the physical world
in precise fashion, it is always necessary to make certain
idealizations. Foremost among these is the assumption of
known cause and effect, and, perhaps, even that of cause and
effect in itself.

To treat physical processes in a more realistic way, we
must take into account unknown causes and unknown effects. We
find ourselves in the ironical position of making precise what
we mean by ignorance.

At the present time, there exist a number of approaches
to this fundamental conundrum, all based upon the concept of a
random variable. Building upon this foundation is the theory
of games.

We shall discuss here only the direct application of the
concept of stochastic processes, leaving the game aspects for
a later date.

The theory of probability in a most ingenious fashion
skirts the forbidden region of the unknown by ascribing to an
unknown quantity a distribution of values according to a certain
law. Having taken this bold step, it is further agreed that
we shall measure performance not in terms of a single outcome,
but in terms of an average taken over this distribution of
values. Needless to add, this artifice has been amazingly
successful in the analysis of physical processes; e.g.,
statistical mechanics, quantum mechanics.

Following this line of thought, we begin to take account
of unknown effects by supposing that the result of a decision
q is not to transform p into a fixed state T(p,q), but
rather to transform p into a stochastic vector z whose
distribution function is dG(z,p,q), dependent upon both the
initial vector p and the decision q. Let us further suppose
that the purpose of the process is to maximize the expected
value of a preassigned function, $\varphi$, of the final state of
the system.

Before setting up the functional equation analogous to
(6.3), let us review the course of the process. At the initial
time, an initial decision $q_1$ is made, with the result that
there is a new state $p_1$, which is observed. On the basis of
this information, a new decision, $q_2$, is made, and so on.

It is important to emphasize the great difference between
a feedback control process of this type, in which the $q_i$ are
chosen stage-by-stage, and a process in which the $q_i$ are
chosen all at once at some initial time.

In the deterministic case, the two processes are equivalent,
and it is only a matter of convenience whether we use one
or the other formulation.* In the stochastic case, the two
processes are equivalent only in certain special situations.
We shall be concerned here only with the stage-by-stage choice.

* This corresponds to the choice we have of describing a
curve as a locus of points or as an envelope of tangents.

The analogue of (6.4) is then


P-1416 
Revised  2-6-59 

(1)  $f_1(p) = \max_{q} \int_{z} \varphi(z)\, dG(z,p,q),$

and that of (6.3) is

(2)  $f_N(p) = \max_{q} \int_{z} f_{N-1}(z)\, dG(z,p,q), \quad N = 2, 3, \ldots.$*

This type of process has been discussed in some detail
in [1].
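The difference between the stage-by-stage choice and the choice of all decisions at the initial time can be seen on a small example. The following sketch rests entirely on hypothetical ingredients: an integer state, a decision that takes effect only with probability `SUCCESS`, and a terminal criterion `phi` that is largest at z = 2. The function `f` follows the recurrence (1), (2); `open_loop` fixes the whole sequence of decisions in advance.

```python
from functools import lru_cache
from itertools import product

CONTROLS = (-1, 0, 1)  # hypothetical admissible decisions
SUCCESS = 0.5          # hypothetical probability that a decision takes effect


def outcomes(p: int, q: int):
    """Return the distribution dG(z, p, q) as a list of (z, probability)."""
    if q == 0:
        return [(p, 1.0)]
    return [(p + q, SUCCESS), (p, 1.0 - SUCCESS)]


def phi(z: int) -> float:
    """Hypothetical terminal criterion, largest at z = 2."""
    return -float((z - 2) ** 2)


@lru_cache(maxsize=None)
def f(N: int, p: int) -> float:
    """Expected terminal value when each decision is made after observing the state."""
    if N == 1:
        return max(sum(pr * phi(z) for z, pr in outcomes(p, q)) for q in CONTROLS)
    return max(sum(pr * f(N - 1, z) for z, pr in outcomes(p, q)) for q in CONTROLS)


def expected_terminal(p: int, plan) -> float:
    """Expected phi of the final state when the decisions in `plan` are fixed in advance."""
    if not plan:
        return phi(p)
    return sum(pr * expected_terminal(z, plan[1:]) for z, pr in outcomes(p, plan[0]))


def open_loop(N: int, p: int) -> float:
    """Best expected value over all decision sequences chosen at the initial time."""
    return max(expected_terminal(p, plan) for plan in product(CONTROLS, repeat=N))


closed = f(3, 0)
opened = open_loop(3, 0)
print(closed, opened)
```

In this example the feedback value exceeds the open-loop value, because a policy that observes the state can stop pushing once z = 2 has been reached, whereas a plan fixed in advance cannot.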

8. Second Level Processes

Fortunately for the mathematician interested in these
processes, the tale does not end here! It turns out to be the
case that in a number of significant applications, it cannot
be safely assumed that the unknown quantities possess known
distribution functions.

In many cases, we must face the fact that we are dealing
with more complex situations in which far less is known about
the unknown quantities. For a discussion of the importance
of these processes in the general theory of design and control,
see McMillan, [9]; for a discussion of the dangers and
difficulties inherent in any mathematical treatment, see Zadeh,
[15].

A first attempt in salvaging much of the structure already
erected is to assume that the unknown quantities possess
fixed, but unknown, distribution functions. Regarding
deterministic processes as those of zeroth level, and the
stochastic processes described in §7 as first-level processes, we
shall refer to these new stochastic processes as second-level
processes.

* The descriptive version of equation (7.2), when no control
is exerted, is, of course, the Chapman-Kolmogoroff
equation, the stochastic analogue of (5.2).

Although it is clear that we now possess a systematic
method for constructing a hierarchy of mathematical models, we
shall restrict ourselves in the remainder of this paper to the
discussion of second-level processes.

9.  Additional  Assumptions 

Some further assumptions are required if we wish to proceed
from this point to an analytic treatment. These are

I. We possess an a priori estimate for the distribution
function governing the physical state of the system,
which, until further knowledge is acquired, we regard
as the actual distribution.

II. We possess a set of rules which tells us how to modify
this a priori distribution so as to obtain an a
posteriori distribution when additional information
is obtained.

III. We possess an a priori estimate for the distribution
functions governing the outcomes of decisions, which,
until further knowledge is acquired, we regard as the
actual distribution, and, as above, we know how to
modify this in the light of subsequent information.

In this paper, we restrict ourselves to the case of known
physical states.

In formal terms, our state vector is now compounded of a
point in phase space, p, and an information pattern,
dG(z,p,q). As a result of a decision $q_1$, there result the
transformations

(1)  $p_0 \rightarrow p_1$ (observed)

     $dG(z,p^*,q) \rightarrow dH(z,p^*,q;\, p_0,G,q_1,p_1)$ (hypothesized).

On the basis of these assumptions, and considering a control
process which continues in time as described above, we
wish to pose the problem of determining optimal policies. For
the first time, we are considering adaptive processes
significantly different from those of the usual deterministic or
stochastic control process.

10. Functional Equations for Second-level Processes

As before, we introduce the function

(1)  $f_N(p; G(z,p^*,q))$ = the expected value of $\varphi(p_N, G_N)$
     obtained using an optimal policy for an N-stage process
     starting in state (p,G).

Depending upon the objectives of the process, only one or the
other of $p_N$ and $G_N$ may enter into $\varphi$. Examples of both
extremes abound.

Arguing as in the preceding sections, we see that the
basic recurrence relation is

(2)  $f_N(p; G(z,p^*,q)) = \max_{q_1} \int_{w} f_{N-1}\big(w; H(z,p^*,q;\, p,G,q_1,w)\big)\, dG(w,p,q_1),$

for N = 2,3,..., with

(3)  $f_1(p; G(z,p^*,q)) = \max_{q_1} \int_{w} \varphi\big(w, H(z,p^*,q;\, p,G,q_1,w)\big)\, dG(w,p,q_1).$

These equations are quite useful in the derivation of
existence and uniqueness theorems concerning optimal policies,
return functions, and in ascertaining certain structural
properties of optimal policies; cf. [1], [2].

If, however, we treat processes which are too complex for
a direct analytic approach, as is invariably the case for
realistic models, we wish to be able to fall back upon a
computational solution. The occurrence of functions of functions,
e.g. the sequence $\{f_N(p;G)\}$, effectively prevents this.

11.  Further  Structural  Assumptions 

In order to reduce the foregoing equations to more
manageable form, let us assume that the structure of the actual
distribution is known, but that the uncertainty arises with
regard to the values of certain parameters.

At any stage of the process, in place of an a priori
estimate, G(z,p,q), for the distribution function, we suppose
that we have an a priori estimate for the distribution function
governing the unknown parameters. Again, a basic assumption is
that this distribution function exists.

The functional equations that we derive are exactly as
above, with the difference in meaning of the distribution
functions that we have just described.

12.  Reduction  from  Functionals  to  Functions 

We are now ready to take the decisive step of reducing
$f_N(p,G)$ from a functional to a function.

It may happen, and we will give an example in a moment,
that the change in the distribution function, from G(z,p,q)
to $H(z,p^*,q;\, p,G,q_1,w)$, is one that can be represented by a
point transformation. This will be the case if G and H are
both members of a family of distribution functions $K(z;\alpha)$
characterized by a vector parameter $\alpha$. Thus, if

(1)  $G(z,p,q) \equiv K(z,p,q;\alpha),$

     $H(z,p^*,q;\, p,G,q_1,w) \equiv K(z,p,q;\beta),$

the change from G to H may be represented by

(2)  $\beta = \psi(p,\alpha,q_1,w).$

Then we may write

(3)  $f_N(p, G(z,p,q)) = f_N(p;\alpha),$

and (10.2) becomes

(4)  $f_N(p;\alpha) = \max_{q_1} \int_{w} f_{N-1}(w;\beta)\, dK(w;\alpha).$

The dependence upon $q_1$ is by way of (2).

13. An Illustrative Process: Deterministic Version

Let us now show how these ideas may be applied to the
study of control processes. Consider a discrete scalar
recurrence relation of the form

(1)  $u_{n+1} = a u_n + v_n, \quad u_0 = c.$

Here $u_n$ is the state variable and $v_n$ is the control
variable. Suppose that the sequence $\{v_n\}$ is to be chosen to
minimize the function

(2)  $|u_N| + b \sum_{k=1}^{N} u_k^2$

subject to the constraints

(3)  $|v_i| \le r, \quad i = 0, \ldots, N-1.$


Although the precise analytic form of the criterion
function is of little import as far as the present discussion
is concerned, we have used specific functions to make the
presentation as concrete as possible. Furthermore, the defining
equation need not be linear.

This is a simple example of a deterministic control
process. Introduce the sequence of functions $\{f_N(c)\}$ defined
by the relation

(4)  $f_N(c) = \min_{\{v_i\}} \left[ |u_N| + b \sum_{k=1}^{N} u_k^2 \right],$

where N takes on the values 1,2,..., and c any real
value.

Then

(5)  $f_1(c) = \min_{|v_0| \le r} \left[ |ac + v_0| + b(ac + v_0)^2 \right],$

and for $N \ge 2$, the principle of optimality yields the
relation

(6)  $f_N(c) = \min_{|v_0| \le r} \left[ b(ac + v_0)^2 + f_{N-1}(ac + v_0) \right].$
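The recurrence for $f_N(c)$ lends itself to direct computation once the admissible controls are restricted to a finite set. The sketch below is an illustration only: the numerical values of a, b, r and the five-point `CONTROL_GRID` replacing the interval $|v| \le r$ are hypothetical assumptions, and the result is compared with a minimization over all control sequences drawn from the same grid.

```python
from itertools import product

A, B, R = 0.9, 0.5, 1.0                         # hypothetical values of a, b, r
CONTROL_GRID = [-R, -0.5 * R, 0.0, 0.5 * R, R]  # hypothetical discretization of |v| <= r


def f(N: int, c: float) -> float:
    """Return f_N(c) on the control grid, using the principle of optimality."""
    if N == 1:
        return min(abs(A * c + v) + B * (A * c + v) ** 2 for v in CONTROL_GRID)
    return min(B * (A * c + v) ** 2 + f(N - 1, A * c + v) for v in CONTROL_GRID)


def criterion(c: float, controls) -> float:
    """Evaluate |u_N| + b * sum of u_k^2 for a given control sequence."""
    u, total = c, 0.0
    for v in controls:
        u = A * u + v
        total += B * u ** 2
    return abs(u) + total


def brute_force(N: int, c: float) -> float:
    """Minimize the criterion over every control sequence on the grid."""
    return min(criterion(c, vs) for vs in product(CONTROL_GRID, repeat=N))


print(f(3, 2.0), brute_force(3, 2.0))
```

The recursion here is evaluated without tabulating $f_{N-1}$, which is adequate for small N; a practical computation would store $f_{N-1}$ on a grid of c-values and interpolate.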


14. Stochastic Version

In place of the recurrence relation of (13.1), let us
introduce a stochastic transformation

(1)  $u_{n+1} = a u_n + r_n + v_n, \quad u_0 = c.$

Here $\{r_n\}$ is a sequence of independent random variables
assuming only the values 1 and 0. Let

(2)  $r_n = 1$ with probability $p$,
     $r_n = 0$ with probability $1 - p$.

The quantity p is known, and for simplicity taken to be
independent of n, although this is not necessary.

We now wish to minimize the expected value of the quantity
appearing in (13.2). This is now a stochastic control process
of the type described above in general terms. Call the minimum
expected value $f_N(c)$. Then, following the procedures of §7,
we have the relations


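(3)  $f_1(c) = \min_{|v_0| \le r} \int \left[ |ac + v_0 + r_0| + b(ac + r_0 + v_0)^2 \right] dG(r_0)$

     $= \min_{|v_0| \le r} \left[ p\left( |ac + v_0 + 1| + b(ac + v_0 + 1)^2 \right) + (1 - p)\left( |ac + v_0| + b(ac + v_0)^2 \right) \right],$

and, for general N,

(4)  $f_N(c) = \min_{|v_0| \le r} \left[ p\left( b(ac + v_0 + 1)^2 + f_{N-1}(ac + v_0 + 1) \right) + (1 - p)\left( b(ac + v_0)^2 + f_{N-1}(ac + v_0) \right) \right].$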
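A computational sketch of the stochastic recurrence follows. It is offered only as an illustration: the constants `A`, `B`, `R` and the finite `CONTROL_GRID` standing in for the interval $|v_0| \le r$ are hypothetical, and the probability p is passed as an argument so that special cases can be examined.

```python
A, B, R = 0.9, 0.5, 1.0                         # hypothetical values of a, b, r
CONTROL_GRID = [-R, -0.5 * R, 0.0, 0.5 * R, R]  # hypothetical discretization of |v| <= r


def f(N: int, c: float, p: float) -> float:
    """Minimum expected criterion for N stages from state c, with known p."""
    best = float("inf")
    for v in CONTROL_GRID:
        up = A * c + v + 1.0  # next state when the random variable equals 1
        stay = A * c + v      # next state when the random variable equals 0
        if N == 1:
            cost = (p * (abs(up) + B * up ** 2)
                    + (1.0 - p) * (abs(stay) + B * stay ** 2))
        else:
            cost = (p * (B * up ** 2 + f(N - 1, up, p))
                    + (1.0 - p) * (B * stay ** 2 + f(N - 1, stay, p)))
        best = min(best, cost)
    return best


print(f(1, 0.0, 0.5))
print(f(3, 2.0, 0.5))
```

Setting p = 0 removes the random term and recovers the deterministic recurrence of §13 on the same grid, which gives a simple consistency check.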


15.  Adaptive  Control  Version 

Let us now consider the adaptive control version. We are
given the information that the random variables $r_n$ possess
distributions of the special type described above, but we do
not know the precise value of p.

We shall assume, however, that we do possess an a priori
distribution for the value of p, dG(p), and that we possess
a known rule for modifying this a priori distribution on the
basis of the observations that are made as the process unfolds.

If we observe that over the past m + n stages, the
random variables $r_i$ have taken on m values of 1 and n
values of 0, we take as our new a priori distribution the
function

(1)  $dG_{m,n}(p) = \dfrac{p^m (1 - p)^n\, dG(p)}{\int_0^1 p^m (1 - p)^n\, dG(p)},$

a Bayes approach.*

Once we have fixed upon a choice of G(p), the a priori
distribution function at any stage of the process is uniquely
determined, from the foregoing, by the numbers m and n.

This simple observation enables us to reduce the information
pattern from that of the specification of a number, or vector
in general, plus a function $G_{m,n}(p)$, to that of the
specification of three numbers, c and the two integers m and n.

In this way, we reduce the problem from one requiring the
use of functionals to one utilizing only functions. This is
an essential step not only for computational purposes, but for
analytic purposes as well.
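The role of the pair (m, n) can be illustrated with a prior concentrated on finitely many values of p. This is a minimal sketch under that assumption; the representation of dG(p) as a list of (value, weight) pairs and the function names are hypothetical. It shows that updating one observation at a time leads to the same distribution as using the counts m and n alone.

```python
def posterior(prior, m, n):
    """Return the prior reweighted by p^m (1 - p)^n and renormalized.

    `prior` is a list of (p, weight) pairs, a hypothetical discrete form of dG(p).
    """
    unnormalized = [(p, w * p ** m * (1.0 - p) ** n) for p, w in prior]
    total = sum(w for _, w in unnormalized)
    return [(p, w / total) for p, w in unnormalized]


def update_one(dist, r):
    """Bayes update after a single observation r, equal to 1 or 0."""
    return posterior(dist, 1, 0) if r == 1 else posterior(dist, 0, 1)


prior = [(0.2, 0.5), (0.8, 0.5)]  # hypothetical two-point prior for p
observations = [1, 0, 1, 1]       # three ones and one zero, in some order

sequential = prior
for r in observations:
    sequential = update_one(sequential, r)

from_counts = posterior(prior, 3, 1)
print(sequential)
print(from_counts)
```

The order in which the ones and zeros occurred does not affect the result, which is why the three numbers c, m, n suffice as the state of the process.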

Let us then introduce the sequence of functions
$\{f_N(c,m,n)\}$ defined once again as the minimum expected value
of the quantity in (13.2), starting with the information
pattern of m ones and n zeros, and state c.

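Then

(2)  $f_1(c,m,n) = \min_{|v_0| \le r} \int_0^1 \left[ p\left( |ac + v_0 + 1| + b(ac + v_0 + 1)^2 \right) + (1 - p)\left( |ac + v_0| + b(ac + v_0)^2 \right) \right] dG_{m,n}(p),$

(3)  $f_N(c,m,n) = \min_{|v_0| \le r} \int_0^1 \left[ p\left( b(ac + v_0 + 1)^2 + f_{N-1}(ac + v_0 + 1, m+1, n) \right) + (1 - p)\left( b(ac + v_0)^2 + f_{N-1}(ac + v_0, m, n+1) \right) \right] dG_{m,n}(p),$

for N = 2,3,..., where $dG_{m,n}(p)$ denotes the modified a priori
distribution given above.

* This is an assumption of the type called for in §9.
Although reasonable, it is not the only one possible. There
are analytical advantages in choosing G to be a beta
distribution.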


In this fashion, we obtain a computational approach to
processes with general criteria and an analytic approach to
processes with criteria of particular type. A thoroughgoing
discussion of the analytic aspects of the solution of processes
of this nature described by linear equations and quadratic
criteria will be found in a forthcoming doctoral thesis by
Marshall Freimer.

Previous applications of these techniques may be found in
[2] and [3].
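A computational sketch of the adaptive version may help fix ideas. It is not taken from the text and rests on several assumptions: the constants and `CONTROL_GRID` are hypothetical, the a priori distribution dG(p) is represented by finitely many (value, weight) pairs, and the recurrence is written in terms of the a posteriori mean of p, which is what the average over the modified distribution reduces to because the bracketed costs do not themselves depend on p.

```python
A, B, R = 0.9, 0.5, 1.0                         # hypothetical values of a, b, r
CONTROL_GRID = [-R, -0.5 * R, 0.0, 0.5 * R, R]  # hypothetical discretization of |v| <= r


def posterior_mean(prior, m, n):
    """Mean of p after observing m ones and n zeros, for a discrete prior."""
    num = sum(w * p ** (m + 1) * (1.0 - p) ** n for p, w in prior)
    den = sum(w * p ** m * (1.0 - p) ** n for p, w in prior)
    return num / den


def f(N, c, m, n, prior):
    """Minimum expected criterion from state c with information pattern (m, n)."""
    p_bar = posterior_mean(prior, m, n)
    best = float("inf")
    for v in CONTROL_GRID:
        up = A * c + v + 1.0  # next state if a one is observed
        stay = A * c + v      # next state if a zero is observed
        if N == 1:
            cost = (p_bar * (abs(up) + B * up ** 2)
                    + (1.0 - p_bar) * (abs(stay) + B * stay ** 2))
        else:
            cost = (p_bar * (B * up ** 2 + f(N - 1, up, m + 1, n, prior))
                    + (1.0 - p_bar) * (B * stay ** 2 + f(N - 1, stay, m, n + 1, prior)))
        best = min(best, cost)
    return best


two_point = ((0.2, 0.5), (0.8, 0.5))  # hypothetical prior: p is 0.2 or 0.8
print(posterior_mean(two_point, 3, 1))
print(f(3, 0.0, 0, 0, two_point))
```

When the prior is concentrated on a single value of p, the counts m and n no longer matter and the computation reduces to the stochastic recurrence of §14 with that p.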

16.  Sufficient  Statistics 

The fact that the past history of the process described
in the preceding paragraphs can be compressed in the indicated
fashion, so that functions rather than functionals occur, is a
particular instance of the power of the theory of "sufficient
statistics"; cf. Mood, [10].

Many further applications of this important concept will
be found in the thesis of Freimer mentioned above.

In a number of cases, this compression of data occurs
asymptotically as the process continues; e.g. the central
limit theorem. A number of quite interesting questions arise
from this observation.

17.  Discussion 

In the foregoing pages, we have attempted to construct a
mathematical foundation for the study of the many fascinating
aspects of the field of adaptive control. In further papers,
we shall discuss a number of complex problems which arise from
this approach.

From the purely mathematical point of view, we are now
able to contemplate a theory of continuous control processes of
adaptive type, obtained as a limiting form of the theory of
discrete control processes. A variety of significant
convergence questions are encountered in this way.

Furthermore, we can on the same foundations construct a
theory of multi-stage games.

Finally, the problem of computational solution is by no
means routine, and there are a variety of interesting approaches
based upon approximations in function space and approximations
in policy space to be explored.

From the conceptual point of view, we must face the fact
that there are many further uncertainties to be examined: in
the state of the system, in the observation of the random
effect, in the transmission of the control signal, in the
duration of the process, and even in the criterion function
itself.